Using Hierarchies, Aggregates and Statistical models to discover Knowledge from Distributed Databases

Authors

  • Rónán Páircéir
  • Sally McClean
  • Bryan Scotney
Abstract

Data Warehouses and statistical databases (Shoshani 1997) contain both numerical attributes (measures) and categorical attributes (dimensions). These data are often stored within a relational database with an associated hierarchical structure. There are few algorithms to date that explicitly exploit this hierarchical structure when carrying out knowledge discovery on such data. We look at a number of aspects of knowledge discovery from a set of databases distributed over the internet, including the following:

  • Discovery of statistical relationships, rules and exceptions from hierarchically structured data which may contain heterogeneous and non-independent instances;
  • Use of aggregates as a set of sufficient statistics in place of base data for efficient model computation;
  • Leveraging the power of a relational database system for efficient computation of sufficient statistics;
  • Use of statistical metadata to aid distributed data integration and knowledge discovery.
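To make the idea of aggregates as sufficient statistics concrete, the following is a minimal sketch and not the authors' method. It assumes a simple least-squares line as the model, and it uses two in-memory SQLite databases to stand in for sites distributed over the internet. The table name `sales`, its columns, and all function names are hypothetical. Each site's database engine computes five aggregates, and only those aggregates are combined to fit the model.

```python
import sqlite3

def make_site(rows):
    """Create an in-memory database standing in for one remote site
    (hypothetical schema: a categorical dimension and two measures)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, x REAL, y REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    return conn

# The relational engine computes the aggregates; base rows never leave the site.
AGG_SQL = """
SELECT COUNT(*), SUM(x), SUM(y), SUM(x * x), SUM(x * y)
FROM sales
"""

def site_statistics(conn):
    """Return (n, sum_x, sum_y, sum_xx, sum_xy) for one site."""
    return conn.execute(AGG_SQL).fetchone()

def combine(stats_list):
    """These aggregates are additive, so sites combine by element-wise sum."""
    return tuple(sum(column) for column in zip(*stats_list))

def fit_line(stats):
    """Least-squares slope and intercept of y on x, from aggregates alone."""
    n, sx, sy, sxx, sxy = stats
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept

# Illustrative data only: every row satisfies y = 2x + 1.
site_a = make_site([("north", 1.0, 3.0), ("north", 2.0, 5.0)])
site_b = make_site([("south", 3.0, 7.0), ("south", 4.0, 9.0)])

total = combine([site_statistics(site_a), site_statistics(site_b)])
slope, intercept = fit_line(total)
print(slope, intercept)
```

Because the aggregates add across sites, the same query with a `GROUP BY` on a dimension such as `region` would give per-category statistics that can be rolled up to a higher level of a hierarchy without revisiting the base data.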

Similar resources

Conceptual Clustering of Heterogeneous Distributed Databases

With increasingly more databases becoming available on the Internet, there is a growing opportunity to globalise knowledge discovery and learn general patterns, rather than restricting learning to specific databases from which the rules may not be generalisable. Clustering of distributed databases facilitates learning of new concepts that characterise common features of, and differences between...

Using Concept Hierarchies in Knowledge Discovery

In Data Mining, one of the steps of the Knowledge Discovery in Databases (KDD) process, the use of concept hierarchies as background knowledge allows the discovered knowledge to be expressed at a higher level of abstraction, more concisely and usually in a more interesting format. However, data mining for high-level concepts is more complex because the search space is generally too big. Some data minin...

Text Modeling using Unsupervised Topic Models and Concept Hierarchies

Statistical topic models provide a general data-driven framework for automated discovery of high-level knowledge from large collections of text documents. While topic models can potentially discover a broad range of themes in a data set, the interpretability of the learned topics is not always ideal. Human-defined concepts, on the other hand, tend to be semantically richer due to careful selecti...

Clustering classifiers for knowledge discovery from physically distributed databases

Most distributed classification approaches view data distribution as a technical issue and combine local models, aiming at a single global model. This, however, is unsuitable for inherently distributed databases, which are often described by more than one classification model, and these models might differ conceptually. In this paper we present an approach for clustering distributed classifiers in order to d...

Clustering Algorithm for Large-Scale Databases

Clustering systems can discover intentional structures in data and extract new knowledge from a database. Many incremental and non-incremental clustering algorithms have been proposed, but they have some problems. Incremental algorithms work very efficiently, but their performance is strongly affected by the input order of instances. On the other hand, non-incremental algorithms are independent...

Publication date: 2000